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Abstract — This paper summarizes some of the major cal- 
ibration and image reconstruction techniques used in radio 
interferometry and describes them in a common mathematical 
framework. The use of this framework has a number of benefits, 
ranging from clarification of the fundamentals, use of standard 
numerical optimization techniques, and generalization or special- 
ization to new algorithms. 

Index Terms — radio interferometry, calibration, imaging, algo- 
rithms, computing. 



I. Introduction 

THE theory and practice of radio interferometry, includ- 
ing data processing, is well-advanced and has been the 
subject of a graduate level textbook [1|. This book is rec- 
ommended for the detailed descriptions of the fundamentals. 
In this paper, we aim to summarize recent advances in the 
theory and practice of calibration and imaging, arising from 
the work of several of the authors over the past ten years. 
We draw upon a number of our papers, placing the results 
in a common framework and nomenclature. We also present a 
number of new insights and algorithms arising in recent work. 

The last decade has seen a substantial growth in the number 
and diversity of radio synthesis telescopes being constructed 
Examples include the Expanded Very Large Array (EVLA, 
0, the Low Frequency Array (LOFAR, Q), the Square 
Kilometre Array (SKA, [4]), the Australian Square Kilometre 
Array Pathfinder (ASKAP, [5 1) and the Karoo Array Telescope 
(MeerKAT [6|). These telescopes bring both new science and 
new technical challenges. Prime amongst these challenges are: 

• Theory to describe new observing modalities and previ- 
ously ignorable effects, 

• Algorithms to solve the resulting equations, 

• A required increase in algorithmic performance in terms 
of sensitivity and dynamic range, 

• A large increase (hundreds or thousands) in data volume, 

• The need for algorithms adapted to high performance 
computing, particularly the shift to highly parallel or 
concurrent processing. 

The concept of a measurement equation is key to our work. 
Hamaker, Bregman, and Sault [7] were particularly notable in 
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emphasizing the importance of a single equation to describe 
a measurement process (as opposed to, say, a set of loosely 
related equations). 

Section [TT] describes the measurement equation in radio 
interferometry. Section [III] describes the solution of the mea- 
surement equation as an optimization problem and describes 
standard algorithms and methods used to solve it - calibration 
of direction independent instrumental effects and imaging 
using a point-source flux model. Section ITVl describes recent 
advances in algorithms that account for direction dependent 
instrumental effects during imaging. Section [V] describes re- 
cent advances in deconvolution algorithms. 

II. Measurement Equation in Radio 
Interferometry 

Aperture synthesis is an indirect imaging technique where 
the spatial Fourier transform of an image is measured via its 
mutual coherence function. A radio interferometer [ 8 1 consists 
of a collection of spatially separated antennas. The aperture 
plane of the interferometer is the plane perpendicular to the 
instantaneous direction from the array to a reference point 
on the sky sq called the phase-reference center. A baseline 
bij is defined as the vector between the 3D locations of two 
antennas i and j, projected onto this aperture plane. The 
components of bij are measured in units of wavelength A 
and denoted as u,v,w where u, v are 2D spatial frequencies 
and w describes the height of an antenna relative to the 
plane of the array in the direction of sq. For electromagnetic 
radiation from a spatially incoherent brightness distribution, 
the mutual coherence function is defined as the time averaged 
cross correlation product of the total electric field measured at 
two aperture points (antennas) with a time delay between the 
measurements, and is given by 



r(6) 



E(s,t) ■ E*(s,t-b- s/cj) e 



-2mb 



s/x <m (i) 



where s = sq + a describes a point near the phase reference 
centre, E(s, t) is the complex amplitude of the radiation 
emanating from a source in the direction s, b ■ s/c is the time 
difference between the incoming radiation collected at two 
antennas separated by b, and dfl = ds/R 2 where R is the 
distance between the source and the aperture plane. 

Signals from all antennas are delay corrected by a com- 
mon factor given by b ■ sq/c, to steer the array towards 
so. If the maximum remaining delay b ■ a/c is smaller than 
the signal coherence time, the term in the angle brackets 
becomes the source autocorrelation function or the three- 
dimensional source brightness distribution 1(1, m,n), where 



IEEE SPECIAL ISSUE ON ADVANCES IN RADIO TELESCOPES (IN PRESS) 



2 



l,m,n — VT — I 2 — m? are direction cosines describing a. 
Eq. Q] becomes 

V(u,v,w)= ( J( *' ™' n) e- 2 ^ ul+vm +< n -^dldm (2) 
J n 

When the array is coplanar (w w 0), or the region of the 
sky being imaged may be assumed fiat (n w 1), Eq. [2] 
describes a 2D spatial Fourier transform relation between the 
mutual coherence function and the source brightness. This is 
the Van Cittert Zernike theorem [ 1 ] and forms the basis for 
interferometric imaging. 

To measure polarised radiation Q, two nominally orthog- 
onal components of the incident electric field E{ = [E p E q ]f 
are measured at each antenna i. Four cross-correlation pairs 
(two cross-hand and two parallel-hand) are formed per base- 
line as (Ei ® Ej), The resulting coherence vector is denoted 
as Vij = [V pp V pq V qp V qq }fj. The vector of images corre- 
sponding to the four correlations is I = [I pp I pq I qp I qq ] T 
and is related to the standard Stokes vector of images by a 
linear transform. 

The measured incoming radiation is modified by propaga- 
tion effects and receiver electronics. Jones matrices describe 
this modulation for the electric field incident at each orthog- 
onal pair of feeds Ei — [E p E q ]f . Direction independent 
effects for antenna i are described as J'" ls — [GDC], a 2 x 2 
matrix product of complex antenna gains (G), polarisation 
leakage {D) and feed configuration (G). Direction dependent 
effects are described by J? y = [EPF], a product of antenna 
illumination patterns (E), parallactic angle effects (P) and 
tropospheric and ionospheric effects and Faraday rotation 
(F). The effect on each baseline ij is described by the 
outer-product of these antenna-based Jones matrices given by 
K {vis,sky} = ^ j, ^ j^{ vis , s ky} t a 4 x 4 matr i x . (T n this paper, 

the f superscript is used to denote conjugate transpose or 
operator adjoint.) 

The measurement equation [ 8 1 for one baseline (spatial fre- 
quency), one frequency channel, and one integration timestep, 
is given by 

V° bs = [KVj s ] J [K^ y ]I sk y{s)e- 2 ^- 3 ' x dn (3) 

All instrumental and propagation effects described by Kij 
need to be corrected during image reconstruction. 

So far, we have dealt with the signals measured at only 
one baseline. With n an t antennas, there are n an t(n an t — l)/2 
baselines that make simultaneous measurements at multiple 
spatial frequencies. The spatial frequency plane can be further 
sampled by varying the positions of the antennas with respect 
to the direction of the phase-reference center. For ground- 
based arrays, the Earth's rotation makes all projected baseline 
vectors b ■ so trace ellipses on the spatial frequency plane, 
slowly filling it up. Measurements at multiple receiver fre- 
quencies also increase the sampling of the spatial-frequency 
plane. Measurements must be made at sufficiently high time 
and frequency resolution, to prevent smearing (averaging of 
visibility data) on the spatial frequency plane. The result is 
generally a centrally dominated uw-plane sampling pattern 
with a hole in the middle and tapered outer edges. This is 



the transfer function of the synthesis array and is called the 
uv-coverage (see [8|). 

The complete measurement equation can be written in 
matrix notation to include the effect of the uv-coverage. Let 
^mxi be a pixelated image of the sky and let V° x \ be a 
vector of n visibilities. Let S n xm be a projection operator 
that describes the uv-coverage as a mapping of in discrete 
spatial frequencies (pixels on a grid) to n visibility samples 
(usually n > m). Let F mxm be the Fourier transform operator 
and c be the number of measured correlations (1, 2 or all 4 of 
{pp,pq,qp,qq}). The measurement equation in block matrix 
form is 

Knxl [^c7ixcn][^cnxcni][Ecrnxcrn][E^ crnxcm ]I cnlx i (4) 

Writing this completely in the spatial frequency domain, 

KinXl = [K cn xcn][Scnxcm][G C mxcm]V cn {xl (^) 
Where [G cmX cm] = [fcm X cm] [^"cmx cm] [-^^ cmxcm] IS a 

convolution operator in the spatial frequency domain with 
[FK sky ] as the convolution filter. 

All discussions that follow are of numerical algorithms, 
described within a mathematical framework amenable to im- 
plementation using standard optimization software. 

III. Standard Calibration and Imaging 

This section describes the solution of the measurement 
equation as a numerical optimization problem. The measure- 
ment equation for a single correlation with no direction- 
dependent terms is given as 

Vnxl = KxJ[^"Xm-fmxm]-f ra xi (6) 

Consider only the pp correlation product, and let the complex 
gains per antenna i be given by [G,] = gf. Then, K^ s = 
Gi ® Gj = g p g* p is a scalar and [K^ xn ] is a diagonal matrix. 

The unknowns in Eq. [6] are the sky brightness I sky and 
the complex gain product for all visibilities K ms . A two- 
stage x 2 minimization process iterates between these two 
parameter subspaces and applies constraints appropriate to 
the different physics involved. Calibration (section IIII-Al i is 
the process of computing and applying the inverse of [X OTS ]. 
Imaging (section IIII-BI ) is the process of reconstructing the 
sky brightness I sky by removing the effect of the instrument's 
incomplete spatial frequency sampling. 

1 A l-D convolution operator is constructed as follows. Consider a * b. 
Let [A] and [B] be diagonal matrices constructed from vectors a and b 
respectively. Then, A*B = F^(FA)(FB) = [F^A F F]B = [C][B]. Here, 
diag(Ap) is the discrete Fourier transform (DFT) of a, and [C] is a Toeplitz 
matrix, with each row containing a shifted version of a. Multiplication of [C] 
with b implements the shift-multiply-add sequence required for the process 
of convolution. Since F is unitary, the singular-value-decomposition of [C] 
is given by the Fourier transform, making it circulant. For a 2D convolution, 
[F] is the outer product of two l-D DFT operators and [C] is block-circulant 
with circulant blocks. 
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A. Calibration 

The elements of [isf OTS ] are first estimated from observations 
of a source whose structure is known a-priori (V™i ) by 
solving Eq. in the form 

xrobs r r^vis iT/model sn\ 

»nxl ~ l A nxn] v nxl \') 

A weighted least squares solution j9] of Eq. [7] is found by 
minimizing x 2 = J2 t j w ij\ V ij bs ~ 9i9*jVij° del ? where Wij is 
a measured visibility weight (inverse of noise variance) and 
y model p rov ides 0(n 2 ant ) constraints to uniquely factor the 
baseline-based K ms into n ant antenna-based complex gains. 
In cases where the measurements at each baseline contain 
random additive noise that cannot be factored into antenna- 
based terms (closure noise), a baseline-based calibration is 
sometimes done to solve for the elements of [K Vls ] directly. 

[i4T"xn] _1 i s reconstructed from these solutions, and applied 
to the observed visibilities to correct them. 

vz\ r = ferXxi (8) 

To increase the signal to noise ratio of correlations going 
into the algorithm, the visibility data are sometimes pre- 
averaged along data axes over which the solution is likely 
to remain stable. 

B. Imaging 

Using Eqs. [6] and [8] the measurement equation after cali- 
bration is 

[SnxmF mX m]I m xl = Kixl" (^) 

A weighted least squares estimate of I sky is found by solving 
the Normal Equations 

[F^WSF]I s *l x = [F*&W\V%X (10) 

where W nxn is a diagonal matrix of signal-to-noise based 
measurement weights and denotes the mapping of mea- 
sured visibilities onto a spatial frequency grid. 

The Hessian [F^S^WSF] on the LHS of Eq. [10] describes 
the imaging properties of the instrument, and the RHS de- 
scribes the raw image produced by direct Fourier inversion 
of the calibrated visibilities. When V™? = l nx i, the RHS 
gives the impulse response function or point spread function 
of the instrument (I ps f), defined as the image produced 
when observing a point-source of unit brightness at the phase 
center. The Hessian is by construction, a circulant convolution 
operator with a shifted version of I ps f in each row. Therefore, 
the dirty image produced by direct Fourier inversion of the 
measurements is the convolution of the true image 

pky with 

the PSF of the instrument and the Normal equations can be 
solved via a deconvolution. 

Since S represents an incomplete sampling of spatial fre- 
quencies (column rank of S nX m is < m X the Hessian is 
singular. Therefore although this convolution is a linear oper- 
ation, the Hessian cannot be directly inverted to create a linear 
deconvolution operator. Instead, an iterative Newton-Raphson 
approach is implemented as follows. 

(a) Initialise the model image I™ to zero or to a model that 
represents a-priori information about the true sky. 



(b) Major Cycle : Compute the \/x 2 (residual) image. 

r es = {[ F isiw][V corr - [SF\I? )]} (11) 

The forward transform V m = [SF]I™ predicts visibilities 
that would be measured for the current sky model and 
residuals are computed as V res = V corr — V m . 
The reverse transform I res — [F^S^W] V res computes 
an image from a set of visibilities. A Preconditioning 
scheme decides how best to weight the visibility data (see 
IIII-B 1 b before Gridding them onto a regular grid of spatial 
frequencies (see lHI-B2l and Fourier inverting to give I res . 

(c) Minor Cycle : Compute the update step by applying an 
operator T to the \/x 2 image. Update the model image. 

i™i = i™ + T{r es j psf ) (12) 

T represents a non-linear deconvolution of the PSF from 
I res while filling-in unmeasured spatial frequencies (null 
space of the measurement matrix) for a complete recon- 
struction of the image. Section IIII-B3I describes T for 
several standard deconvolution algorithms. 

(d) Repeat from (0 until convergence is achieved (I res is 
noise-like) or other termination criteria are satisfied (T 
can no longer reliably extract any flux from I res ). 

(e) The final I m is restored by first smoothing it to the 
maximum angular resolution of the instrument to suppress 
artifacts arising from unconstrained spatial frequencies 
beyond the measured range and then adding in the final 
T res to preserve any undeconvolved flux. 

1 ) Preconditioning: The aim of preconditioning is to alter 
the shape of the PSF according to whatever makes the Normal 
equations easier to solve. This is done by re-weighting the 
data to tune the instrument's sensitivity to a particular type of 
source and signal-to-noise ratio [8]. 

The Natural Weighting scheme gives equal weight to all 
samples and preserves the instrument's peak sensitivity, mak- 
ing it ideal for the detection of low signal-to-noise sources. 
However, the non-uniform sample density on the uv-grid can 
give the PSF a wide main lobe and high sidelobes. Uni- 
form Weighting gives equal weight to each measured spatial 
frequency irrespective of sample density and this lowers its 
peak sensitivity. The resulting PSF has a narrow main lobe 
and suppressed sidelobes across the entire image and is best 
suited for sources with high signal-to-noise ratios to mini- 
mize sidelobe contamination between sources. Super-Uniform 
Weighting gives a PSF with inner sidelobes suppressed as in 
Uniform weighting but far-out sidelobes closer to that with 
Natural weights. The peak sensitivity is also closer to Natural 
weighting. UV Tapering suppresses high spatial frequencies 
and tunes the sensitivity of the instrument to peak for scale 
sizes larger than the resolution element. Robust Weighting 
ifTOl creates a PSF that smoothly varies between Natural and 
Uniform weighting based on the signal-to-noise ratio of the 
measurements and a tunable parameter that defines a noise 
threshold. 

The final imaging weights are given as W lm = 
W pc W where W pc are preconditioning weights and W are 
measurement-noise based weights. The Hessian becomes a 
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convolution operator with the preconditioned PSF in each row 
(IP»f = diag[F^S^W ml ]). 

2) Gridding: The measured visibilities sample the spatial 
frequency plane along elliptical tracks and need to be binned 
onto a regular grid of spatial frequencies so that the FFT algo- 
rithm can be used for Fourier inversion. Gridding interpolation 
is done as a convolution |8|. Each weighted visibility is first 
multiplied with a prolate spheroidal function P s centred at 
its true location. Then, values at the centres of all grid cells 
within a certain radius are read off. P s acts as an anti-aliasing 
function. A grid correction is then done in the image domain 
to remove this multiplicative image-domain effect. 

Let P 8 be a diagonal matrix representing the prolate- 
spheroidal function. G gc = [F(F'P S )F'] is the correspond- 
ing gridding convolution operator in the spatial frequency 
domain, equivalent to multiplying the image domain by 
I go = W Ps]mxm- The normalized dirty image and PSF are 
computed as 

I { m ti tV ' PSf} = w^ m [I^]-^G^S^W im ]V^°r 1} d3) 

where division by w sum — trace(W' im ) normalizes the 
peak of the PSF to unity. Eq. [13] describes the practical 
implementation of the reverse transform of the Major Cycle 
and I dvrt v is the initial I res used to start the iterations. 

The model image p aodel obtained at the end of each Minor 
Cycle is used in the forward transform as 

The calculation of these transforms involves traversals of the 
entire set of visibility data making it computationally expen- 
sive. Deconvolution algorithms usually tailor the frequency 
of Major and Minor cycles to perform trade-offs between 
performance, accuracy and total number of iterations. 

3) Deconvolution: For the Minor Cycle, I dlrt y is assumed 
to be a perfect convolution of the PSF with the true sky bright- 
ness, where ^rt y , ps f} 

are given by Eq. [13] The operator T 
in Eq. [T2] constructs a model image I m via a deconvolution. 

The CLEAN algorithm forms the basis for most deconvolu- 
tion algorithms used in Radio Interferometry. The peak of the 
residual image gives the location and strength of a potential 
point source. The effect of the PSF is removed by subtracting 
a scaled I ps f from I res at the location of each point source 
and updating I m (Eq. [I2l , Many such iterations of finding 
peaks and subtracting PSFs form the Minor Cycle. 

The following deconvolution algorithms model the sky in 
a pixel basis and are best suited to isolated point sources 
whose amplitude is constant across the observing bandwidth. 
Deconvolution algorithms that produce multi-scale and multi- 
frequency source models are described in section [V] 

In Hogbom CLEAN [11], the Minor cycle subtracts a scaled 
and shifted version of the full PSF to update the residual 
image for each point source. Only one Major cycle is done. 
It is computationally efficient but susceptible to errors due to 
inappropriate preconditioning. Clark CLEAN [ 12] does a set of 
Hogbom Minor cycle iterations using a small patch of the PSF. 
A Major Cycle is performed when the brightest peak in the 
residual image is below the first sidelobe level of the brightest 
source in p es . The residual image is then re-computed as 



I res = [F^](FI dlrt y - FI m ) to eliminate aliasing errors. 
Cotton-Schwab CLEAN lfT3l is similar to the Clark algorithm, 
but computes the residual as I res = [F* S^W](V corr - 
[SF]P n ). It is time consuming but relatively unaffected by 
inappropriate preconditioning and gridding errors because it 
computes \ 2 directly in the measurement domain. It also 
allows highly accurate prediction of visibilities without pix- 
elation errors. The Steer-Dewdney-Ito CLEAN Minor Cycle 
finds the locations of sources by setting an amplitude threshold 
to select pixels. The combined set of pixels is then convolved 
with the PSF and subtracted out via a Clark Major Cycle. This 
algorithm is more suited to deconvolving extended emission. 
Maximum Entropy (MEM) 1 14] methods and Non negative 
least squares (NNLS) lPIOI . lfl5l are pixel-based deconvolution 
algorithms that perform a rigorous constrained optimization 
in a basis of pixel amplitudes. MEM solves a least squares 
problem with a penalty function based on image entropy, 
that biases the estimate of the true sky brightness towards 
a known prior image. NNLS deconvolution solves a least- 
squares problem with linear inequality range constraints for 
all its parameters. 

IV. Calibration and Imaging with Direction 
Dependent Instrumental Effects 

Klj y in Eq. [3] represents the effects of direction dependent 
(DD) gains in the measurement from a single interferometric 
baseline. These DD gains can result from a number of in- 
strumental and atmospheric/ionospheric effects, are potentially 
different for each baseline, and can be a function of time, 
frequency, polarisation and direction. In the simplest form 
of this equation these dependencies can be ignored, making 
K*j V purely multiplicative in the image domain. Imaging 
can then proceed as described in section [Hi] (correcting only 
for direction independant terms), with the final image being 
divided by an estimate of K sky to remove the multiplicative 
DD effects. 

In its general form, Eq. [5] in the presence of DD effects for 
a telescope calibrated for [A" lns ] can be written as 

Kixi = [Snxm] [G^ xm ] V m ^ 1 (15) 

Each row of [G^ xm ] acts as a visibility -plane filter (see 
footnote[TJi for the measurements from baseline ij, and is given 
by [Gff]i xm = diag([FK^ y } mxm ), where K-j V is assumed 
to be known from a-priori information. Note that K^ y can 
also be separated into antenna based terms. We will exploit this 
property in Section HV-B3l to devise efficient solvers to solve 
for parametrized forms of [G dd ] for unknown DD effects. 

Equations [5] and Q3] suggest the use of FFT-based forward 
and reverse transforms to account for DD effects using an 
appropriately constructed G dd operator. Data prediction can 
incorporate DD effects by using G dd as the operator for re- 
sampling data from a regular grid (FFT of the model image) 
at points given by the operator S. The reverse transform can 
correct for DD effects by using the conjugate transpose of 
G dd along with the standard anti-aliasing operator G gc for 
gridding the data (see Section Ull-B2l >. For such a transform to 
efficiently correct for DD effects, the G dd filter must satisfy 



IEEE SPECIAL ISSUE ON ADVANCES IN RADIO TELESCOPES (IN PRESS) 



5 



two properties: (1) it should have a finite support size (i.e., 
corresponding K sky should be band-limited), and (2) it should 
be a unitary operator (or approximately so). Effects of the W- 
term and antenna primary beam patterns are two examples of 
DD effects, whose operators have these desirable properties. 

A generalized version of Eq. [13] including the DD effects 
can be written as 



j{dirty,psf} _ [J™t] ~ 1 ]J wt ] ~ 1 J^rt Q9 c Q dd W 



ijy{corr,l} 

(16) 



where 



TWt 

1 dd 



= [F^G dd1 W im G dd F] 



(17) 
(18) 



In the absence of DD effects, G dd is an identity matrix, If?} = 
w sum [l mxm ] and Eq. [TBI reduces to Eq. [T3] I™* is the same 
as the grid correction mentioned in section IIII-B2I to correct 
for the image plane effects of the anti-aliasing operator P s . 
Three special cases are discussed in the following sections. 
1) When G ddi G dd is an identity matrix, is still w sum 
and Eq. [16] can be used to generate l{ dlTty } free of the 
relevant DD effects. The effect of the W-term discussed 
in section IIV-AI corresponds to this case. 



2) When Gff Gff 



is a time dependent function, 7^* 



K sky1 K sky . DD effects due to time varying 
antenna primary beams represent an example of this case, 
as is discussed in section IIV-BI 
3) Mosaic imaging or single pointing imaging with het- 
erogeneous antenna arrays corresponds to case where 
Gff Gff is not the same for all i and j. This is discussed 

LI LI 



in section ITV-CI 

A. Correction for the W-term 

The W-term is related to the fact that Eq. [TJ holds for 
coherence between the E-field measured at two points on a 
common constant phase front of the incident radiation |16|. 
This is true only when the array is coplanar, and the source 
being tracked is at the local zenith ifTTl . Therefore in general, 
the image and visibility planes are not related by a 2D Fourier 
transform. The use of the 2D FFT for imaging wide-fields, 
results in a PSF which is no longer shift-invariant, making 
standard deconvolution algorithms unsuitable. However, if a 
Fresnel diffraction kernel is used as a propagator ifTSl to 
compute the E-field measured at one of the antennas of 
each baseline, the 2D Fourier relation can be preserved. This 
propagator is equal to the Fourier transform of the W-term 



Eq. (, 



). Two algorithms commonly used to 



correct for the effects of the W-term are described below. 

1 ) Faceting algorithms: The effect of the W-term is small 
close to the phase tracking center. This property is exploited 
by algorithms which divide the field of view into a number of 
facets. Images are made by either projecting the facet images 
onto the local tangent plane (image-plane faceting |19|) and 
using the appropriate PSF for the deconvolution of individual 
facet images, or by projecting the (u, v) for each facet onto 
a single tangent plane in the gridding step required for an 
FFT-based reverse transform [20|. This latter method produces 



a single flat image and has several run-time and imaging 
performance benefits ET\ . 

2) W-Projection algorithm: In Eq. [15] the operator \G dd ] 
can be used to account for the W-term by choosing v = 
e w ij (y/i-^-m"-i) i This w . term operator G dd is strictly uni- 
tary (by construction) and has a finite support (due to the 
anti-aliasing operator G 9C ). It will therefore correct for the 
W-term during image deconvolution 112211 . ll2TI . Conservatively 
speaking the W-Projection algorithm is about an order of 
magnitude faster than faceting, and for the same amount of 
computing time can deliver higher dynamic range images J2TJ. 

B. Correction for Primary Beam 

With the increased instantaneous sensitivities of next gen- 
eration telescopes and long integrations required for high 
dynamic range imaging, antennas can neither be considered 
identical nor stable as a function of time. Therefore, next 
generation imaging algorithms need to include corrections for 
the effects of time-varying antenna primary beams 1231 . 112411 . 
Algorithms to correct for these effects can be broadly classified 
into two categories, namely corrections in the image plane 
versus corrections in the Fourier plane. 

1) Image plane correction: When K'j y is different for 
each baseline, one approach for correcting DD effects is the 
direct evaluation of the integral in Eq. [3] for the forward and 
reverse transforms during iterative image deconvolution l25ll . 
The resulting run-time load for realistic data sizes can however 
be prohibitive. To reduce the compute load some-what, an 
FFT based reverse transform (section IIII-B2I) is used, but this 
requires making assumptions about the variability of either the 
sky emission or the antenna power pattern. 

2) Fourier plane correction - The A-Projection algorithm: 
The visibility-plane filter describing the effects of the antenna 
primary beams is the auto-correlation of the antenna aperture 
illumination function. For a finite sized antenna, this clearly 
has a finite support in the Fourier domain. However the 
resulting effective operator (FG d 



*) is only approxi- 
mately unitary [24] . The A-Projection algorithm uses accurate 
forward and approximate reverse transforms based on the 
primary beam operator to correct for time-variable primary 
beam effects (see [24] for details and an example of its 
application to full-beam imaging with the VLA). Apart from 
the initial setup time required to compute the antenna aperture 
function, the run time performance of this algorithm, when 
imaging the entire field of view up to the first side lobe of 
the antenna power pattern, is equivalent to that of standard 
image deconvolution algorithms using a gridding convolution 
function with a support size ~ 30% larger in linear extent. 

3) Pointing Self-Calibration: Antenna pointing errors make 
K^ y (and the resulting G dd ) different for each baseline. When 
K sky represents effects of antenna primary beams, K f- v can 

be decomposed into two antenna based terms as J? v ® J- ky , 
each parametrized for pointing errors, which can be recovered 
by solving the resulting parametrized measurement equation. 
However, iterative solvers using Kff v to represent pointing 
errors necessarily require evaluation of the integral in Eq. [3] 
in each iteration and have proved to be impractically slow. 
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An alternate approach is to solve for antenna pointing errors 
in the visibility domain, where it is efficient to compute G dd 
parametrized by pointing errors. Given a model for the sky, 
the Pointing SelfCal algorithm [26] iteratively solves for these 
pointing errors. This algorithm can be efficiently implemented 
using the forward and reverse transforms described in sec- 
tion [IV^B2] The effects of pointing errors can also be corrected 
along with other direction-dependent effects, as part of an 
iterative image deconvolution. 

C. Mosaicing 

Mosaicing observations consist of a number of independent 
pointings covering a large field of view with an adequate sam- 
pling. Instruments with focal plane arrays can be considered 
to observe a number of mosaic pointings in parallel, while 
traditional instruments observe only one pointing at a time. 
Mosaicing observations can be treated in a natural way using 
the formalism of Eqs. [T6l[T8l Every pointing of the mosaic 
corresponds to a separate G dd and The difference may 
be as little as the pointing direction (i.e. a translation of 
1^1 and phase gradient for G dd ), although more substantial 
changes are possible (e.g. for inhomogeneous arrays). Also, 
in the presence of noise Eq. [16] does not adequately constrain 
the dirty image in those parts of the sky where the weight 
1^ is low. The solution is a generalization of Eq. [16] where 
the product [I^]kl dlrtv for every pointing k is combined to 
form a linear system of equations. This is known as linear 
mosaicing. 

k 

(19) 

where the weight is given by a similar generalization of Eq.fTTI 

^ = J2[ Fi Gfwi m G d k d F) (20) 

k 

Strictly speaking, the PSF calculated as a response to a 
point source located at the centre of the mosaic (or any other 
location; but same for all pointings), is valid only for one 
particular location. For any other direction in the field of view 
the contributions of individual pointings are different, causing 
a different response. Therefore, the deconvolution performed 
in the minor cycle is always an approximate operation and 
a number of major cycles is usually required. However, this 
fact allows one to optimize the PSF calculation by taking into 
account only one pointing which contributes the most to Eq.[T9l 
(e.g. the closest pointing to the centre of the mosaic). Another 
way is to use a representative pointing and apply a phase shift 
to the convolution operator G to centre the primary beam (i.e. 
to remove the offset of this pointing with respect to the mosaic 
centre). 

This approach to mosaicing is a form of a joint deconvolu- 
tion, because the data from all pointings are combined before 
the deconvolution takes place. It was shown to be superior to 
independent deconvolution where the final image is computed 
as a weighted sum of deconvolved sub-images corresponding 
to individual pointings of the mosaic ll27l . 



V. Imaging Algorithms with advanced image 

PARAMETERISATIONS 

So far, the discussions in this paper have focused on the 
calibration and imaging of visibilities from one polarisation 
pair, the use of a pixel basis to parameterize the sky brightness 
distribution, and the assumption that source structure is con- 
stant across the entire bandwidth of data being imaged. In this 
section, we relax these assumptions and describe how standard 
methods can be augmented to handle the added complexity of 
the increased dimensionality of the parameter space. 

A. Multi-Scale CLEAN Deconvolution 

Images of astrophysical objects tend to show complex 
structure at different spatial scales. The use of a pixel-basis for 
deconvolution is ideal for fields of isolated point-like sources 
that are smaller than the instrument's angular resolution, but 
tends to break extended emission into a collection of compact 
sources, which is often inaccurate. A better choice is to 
parameterize the image in a scale-sensitive basis that spans 
the full range of scale sizes measured by the instrument. This 
provides a strong constraint on the reconstruction of visibilities 
in the null space of the measurement matrix. Also, since spatial 
correlation length fundamentally separates signal from noise, 
scale-sensitive deconvolution algorithms generally give more 
noise-like residuals 11281 . 

The Minor Cycle of the Multi-Scale CLEAN algorithm (29] 
parameterizes the image into a collection of inverted tapered 
paraboloids (hk,k — 1 : n sca i es ) whose widths are chosen 
from a predefined list. PSFs and dirty images corresponding 
to each spatial scale are calculated by smoothing l^ d%rt y >P s f} 
from Eq. [13] by each hk- Each iteration i of the Minor 
cycle follows a matched-filtering technique where the location, 
amplitude and scale of each new component is chosen from 
max{I res -k hk} (* denotes convolution) and the update 
step accounts for the non-orthogonality of the different basis 
functions hk- MS-CLEAN works very well for complicated 
spatial structure but its performance is limited by working 
with a discrete set of scale sizes, and the fact that if an 
inappropriate component is chosen it takes the addition of 
many more components to correct it. Typically, n sca i es w 8 
for a source with complex spatial structure. Multi-Resolution 
CLEAN 11301 performs a series of Hogbom Minor Cycles at 
different angular resolutions beginning at the lowest resolution 
to collect all extended emission and progressing to higher 
resolutions. PSFs and residual images at different resolutions 
are made by varying the image pixel sizes during gridding. 
Its limitations are similar to MS -CLEAN, in that there is 
no way to undo a component selection in case a better 
option becomes available later in the iterations, and is less 
robust since it searches for components one scale size at a 
time. The ASP CLEAN [28 1 algorithm parameterizes the sky 
brightness distribution into a collection of Gaussians and does 
a formal constrained optimization on their parameters. In the 
Major Cycle, visibilities are predicted analytically with high 
accuracy. In the Minor Cycle, the location of a flux component 
is chosen from the peak residual, and the parameters of 
the largest Gaussian that fits the image at that location are 
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found. The minimization proceeds over subspaces consisting 
of sets of localized Gaussians whose parameters are varied 
together. This prevents errors due to inappropriate fits from 
propagating very far into the iterations. The computing costs 
and runtimes of each Minor Cycle iteration of MS-CLEAN 
and ASP -CLEAN are a few times worse than Hogbom-CLEAN . 
However, they parameterize the sky brightness more physically 
and convergence is achieved in far fewer iterations. 

B. Multi-Frequency Synthesis Imaging 

The uv-coverage of a synthesis array can be greatly im- 
proved by using the fact that visibilities measured at different 
receiver frequencies correspond to different spatial frequen- 
cies. Multi Frequency Synthesis (MFS) is the process of 
combining data from multiple spectral channels onto the same 
spatial-frequency grid during imaging to take advantage of the 
increased uv-coverage and imaging sensitivity. As long as the 
sky brightness is the same across the total measured band- 
width, standard imaging and deconvolution algorithms can be 
used along with MFS. If the sky brightness varies across the 
observing bandwidth, the narrow-band (or monochromaticity) 
requirement of aperture synthesis breaks down and the Fourier 
relation in the Van Cittert Zernike theorem does not hold. 
The following algorithms fold a frequency dependence of the 
image sky model into the measurement equation to handle this 
problem in the Minor Cycle. 

MF-CLEAN I13T1 is a matched filtering technique based 
on spectral PSFs that describe the instrument's responses to 
point sources with spectra given by Taylor series functions 
(see Eqs. 121122b . Source spectra are modeled as a 

power law and a first order Taylor expansion of I(v) is 
combined with the regular imaging equation to describe the 
dirty image as a sum of convolutions given by l dlrt y = 
J2 t lf rty = Et C t ir * I? Sf > where are the spectral 
PSFs for t — {0, 1}. The deconvolution Minor Cycle simul- 
taneously solves for Co and C\ for each component added 
to the model image. This algorithm uses a pixel basis and 
is most suited for point sources with pure power-law spectra 
with a weak frequency dependence. MS-MF-CLEAN l32l is 
a multi-scale multi-frequency deconvolution algorithm that 
extends MF-CLEAN to work with the instrument's response 
to a polynomial spectrum (n th order Taylor series) at multiple 
spatial scales. This algorithm is suited for extended emission 
and features with non-linear spectra described by a power law 
of varying index across the observing band. 

Some direction-dependant effects in K^ v (e.g. effect of the 
Primary Beam) are also frequency dependant. Therefore the 
spectral PSFs and dirty images used in the Minor Cycle can 
be computed as another generalization of Eq. [16] 

ir fT = [/r]"MCr 1 E^ tG9CG " dt ^ t ^T]^ {corr ' 1} 

(21) 

where 

W% = W* m {{v- VQ )lv Q ) t (22) 

The weight image describes the noise variation across the 
image due to imaging weights and frequency dependant K^ y 



and is given by 

lot = E W "t=oGt d F] (23) 

C. Full polarisation Calibration and Imaging 

The preceeding sections have dealt with the calibration and 
imaging of only one correlation pair pp from a single feed. 
This section deals with the full-polarisation calibration of a 
pair of potentially imperfect orthogonal feeds, and the imaging 
of all four Stokes parameters. 

1) Full-Stokes Calibration: Each baseline measures the 
product of K^ s — jy ts (g) JJ" with the true coherence vector 
seen by that baseline. Eq. q\ becomes 

V~* r obs (" vis ~\-{/-model /"0/1\ 

4nxl — l JX 4nx4n\ v 4nxl V m ) 

and the elements of K$" are computed as described in section 
IIII-AI For a source with known polarisation characteristics, the 
true coherence vector is known (constant x [1,0,0,1] for circu- 
lar feeds and an unpolarised source) and one can form a system 
of linear equations with the elements of K^ s as unknowns. 
For a single baseline, there are up to 10 degrees of freedom 
and 4 equations 11331 . However, with an a-priori source model, 
measurements from all baselines provide enough constraints to 
uniquely factor the baseline-based K^j s matrices into antenna- 
based Jones matrices (4 x n an t{n a nt — l)/2 equations and 
4 x n ant unknowns). In its most general form, the elements 
of J" s can be computed by minimizing \ 2 = Ylij l^ij ~ 
[J" s <g> jp s *]V™\ 2 w.r. to the antenna based J" s . 

In existing software packages, polarisation calibration is 
usually done in stages. First, only the diagonal elements of 
the Jones matrices are solved for, assuming zero leakage 
between the orthogonal feeds. Solutions are then applied 
and only off-diagonal terms are solved for. Another method 
of solving for antenna based gains and leakages from only 
parallel-hand correlations pp,qq is described in [34|. The 
effects of depolarisation cannot be factored into Jones matrices 
and a baseline-based calibration is sometimes carried out by 
artificially imposing constraints between the elements of Kfj s . 

2) Full-Stokes Imaging: The Stokes vector for polarised 
sky brightness I stokes = [I } Q ; U, V} is related to the vector 
of images corresponding to the correlations {pp,pq, qp, qq] as 

fsky _ [c 1 fstokes 

i 4mxl — I°pj4mx4m-<4 m xl \ ZD J 

where S p holds a 4 x 4 linear operator per image pixel Q. 
A full-Stokes deconvolution differs from standard methods in 
the computation of dirty images and the Minor cycle. The 
Stokes vector of dirty images j dirt v, Stokes j s CO mputed by 
applying Eq. [25] to the set of dirty images in the correlation 
basis flirty, cow} given by Eqs. [l3]or[l6] The different Stokes 
parameters are considered to be linearly independent and 
deconvolution minor cycles are performed separately on each 
Stokes image. For compact sources, position constraints are 
sometimes applied across Stokes parameters based on the lo- 
cations of peak residuals of the Stokes I image. [35 1 describes 
an algorithm that applies the constraint of I 2 > Q 2 + U 2 + V 2 
during deconvolution. 
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VI. Conclusion 

We have presented a complete mathematical framework 
for describing many of the major calibration and imaging 
algorithms used in radio interferometry. This framework can 
be used for three purposes: (a) Elucidating the fundamental 
assumptions and details of algorithms, (b) Isolating the math- 
ematical structure so that standard libraries can be used, and 
(c) Allowing both generalization and specialization to generate 
new algorithms. 

The computing and software issues connected with the use 
of this framework are substantial, especially given the large 
data volumes and processing loads being contemplated for new 
radio telescopes. We will discuss these issues further in a sub- 
sequent paper. We note that this framework can also be used 
to address other algorithms not discussed here. These include 
the peeling technique for direction-dependent calibration, the 
problem of ionospheric calibration as a direction-dependent 
effect, and the excision of radio-frequency-interference from 
measured visibility data. 
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